{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6a46e634",
   "metadata": {},
   "source": [
    "Copyright (c) 2021, salesforce.com, inc. \\\n",
    "All rights reserved. \\\n",
    "SPDX-License-Identifier: BSD-3-Clause \\\n",
    "For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "492ea6cc",
   "metadata": {},
   "source": [
    "**Try this notebook on [Colab](http://colab.research.google.com/github/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)!**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd550365",
   "metadata": {},
   "source": [
    "## ⚠️ PLEASE NOTE:\n",
    "This notebook runs on a GPU runtime.\\\n",
    "If running on Colab, choose Runtime > Change runtime type from the menu, then select 'GPU' in the dropdown."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a3dbde4",
   "metadata": {},
   "source": [
    "# Welcome to WarpDrive!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e931567e",
   "metadata": {},
   "source": [
    "This is our third, more advanced tutorial about WarpDrive, a framework for extremely parallelized multi-agent reinforcement learning (RL) on a single GPU. If you haven't yet, please also check out our previous tutorials:\n",
    "\n",
    "- [WarpDrive basics](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb)\n",
    "- [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)\n",
    "\n",
    "In this tutorial, we describe **CUDAEnvironmentReset** and **CUDALogController**. \n",
    "\n",
    "- CUDAEnvironmentReset works exclusively on the GPU to reset the environment in-place. \n",
    "- CUDALogController works exclusively on the GPU to log the episode history. \n",
    "\n",
    "They both play important roles in the WarpDrive framework."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "820e576a",
   "metadata": {},
   "source": [
    "## Dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c4faf10",
   "metadata": {},
   "source": [
    "You can install the warp_drive package either\n",
    "\n",
    "- with the pip package manager, or\n",
    "- by cloning the warp-drive repository and installing it from source.\n",
    "\n",
    "On Colab, we will do the latter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23e7666f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "\n",
    "if IN_COLAB:\n",
    "    ! git clone https://github.com/salesforce/warp-drive.git \n",
    "    %cd warp-drive\n",
    "    ! pip install -e .\n",
    "else:\n",
    "    ! pip install rl_warp_drive"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83822a0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "from warp_drive.managers.data_manager import CUDADataManager\n",
    "from warp_drive.managers.function_manager import CUDAFunctionManager, CUDALogController, CUDAEnvironmentReset\n",
    "from warp_drive.utils.constants import Constants\n",
    "from warp_drive.utils.data_feed import DataFeed\n",
    "from warp_drive.utils.common import get_project_root\n",
    "\n",
    "_CUBIN_FILEPATH = f\"{get_project_root()}/warp_drive/cuda_bin\"\n",
    "_ACTIONS = Constants.ACTIONS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3f0668e",
   "metadata": {},
   "source": [
    "## CUDAEnvironmentReset and CUDALogController"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d2401f7",
   "metadata": {},
   "source": [
    "Assuming you have developed a CUDA environment `step` function, we show here how WarpDrive facilitates the environment rollout by resetting and logging the environment on the GPU. If you have not built `test_build.fatbin` yet, refer to the previous tutorial [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb), which shows how to build it automatically. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78b71dda",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_data_manager = CUDADataManager(num_agents=5, num_envs=2, episode_length=2)\n",
    "cuda_function_manager = CUDAFunctionManager(num_agents=cuda_data_manager.meta_info(\"n_agents\"),\n",
    "                                            num_envs=cuda_data_manager.meta_info(\"n_envs\"))\n",
    "cuda_function_manager.load_cuda_from_binary_file(f\"{_CUBIN_FILEPATH}/test_build.fatbin\")\n",
    "cuda_env_resetter = CUDAEnvironmentReset(function_manager=cuda_function_manager)\n",
    "cuda_env_logger = CUDALogController(function_manager=cuda_function_manager)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcc0f37f",
   "metadata": {},
   "source": [
    "## Step Function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f572c491",
   "metadata": {},
   "source": [
    "We have an example step function already checked in and compiled inside `test_build.fatbin`. \n",
    "\n",
    "The source code of this dummy step function can be found [here](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/dummy_env/test_step.cu). At each step, array `x` is divided by `multiplier` while array `y` is multiplied by the same `multiplier`:\n",
    "\n",
    "```\n",
    "x[index] = x[index] / multiplier;\n",
    "y[index] = y[index] * multiplier;\n",
    "```\n",
    "\n",
    "Now we just need to initialize it with CUDAFunctionManager and wrap it in a Python/CUDA step callable. In `dummy_env` this function is called `cuda_dummy_step()`. \n",
    "\n",
    "Note that **EnvWrapper** automates most of the process below. However, the Python/CUDA step callable you develop must be defined inside your environment so that **EnvWrapper** can find and wrap it. \n",
    "\n",
    "For concrete examples on how to define more complex `step` functions, you can refer to [example1](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/tag_gridworld/tag_gridworld_step.cu) and [example2](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/tag_continous/tag_continuous_step.cu)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cbb51ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_function_manager.initialize_functions([\"testkernel\"])\n",
    "\n",
    "def cuda_dummy_step(function_manager: CUDAFunctionManager,\n",
    "                    data_manager: CUDADataManager,\n",
    "                    env_resetter: CUDAEnvironmentReset,\n",
    "                    target: int,\n",
    "                    step: int):\n",
    "\n",
    "    # Reset any environment whose done flag is set before stepping.\n",
    "    env_resetter.reset_when_done(data_manager)\n",
    "\n",
    "    # Scalar arguments are passed to the kernel as 32-bit integers.\n",
    "    step = np.int32(step)\n",
    "    target = np.int32(target)\n",
    "    # Look up the compiled kernel and launch it on the device arrays.\n",
    "    test_step = function_manager._get_function(\"testkernel\")\n",
    "    test_step(data_manager.device_data(\"X\"),\n",
    "              data_manager.device_data(\"Y\"),\n",
    "              data_manager.device_data(\"_done_\"),\n",
    "              data_manager.device_data(f\"{_ACTIONS}\"),\n",
    "              data_manager.device_data(\"multiplier\"),\n",
    "              target,\n",
    "              step,\n",
    "              data_manager.meta_info(\"episode_length\"),\n",
    "              block=function_manager.block,\n",
    "              grid=function_manager.grid)"
   ]
  },
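  {
   "cell_type": "markdown",
   "id": "e4b1a7c2",
   "metadata": {},
   "source": [
    "As a CPU-side sanity check, the two-line update quoted above can be mirrored in NumPy. This is only a sketch: `dummy_step_reference` is a hypothetical helper, not part of WarpDrive, and it ignores the `done` flags and actions that the real kernel also receives.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def dummy_step_reference(x, y, multiplier):\n",
    "    # Hypothetical CPU mirror of the quoted update: x is divided, y is multiplied.\n",
    "    return x / multiplier, y * multiplier\n",
    "\n",
    "\n",
    "# Illustrative values: two environments (axis 0) with five agents each (axis 1).\n",
    "x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])\n",
    "y = np.array([[6.0, 7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0, 4.0, 5.0]])\n",
    "\n",
    "x1, y1 = dummy_step_reference(x, y, multiplier=2.0)\n",
    "print(x1.mean(axis=1), y1.mean(axis=1))\n",
    "```\n",
    "\n",
    "With a `multiplier` of 2, each step halves the per-environment mean of `x` and doubles that of `y`."
   ]
  },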
  {
   "cell_type": "markdown",
   "id": "7dc7494d",
   "metadata": {},
   "source": [
    "## Reset and Log Function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7eab014",
   "metadata": {},
   "source": [
    "In `cuda_dummy_step()` above, besides the kernel managed by CUDAFunctionManager, you can see a call to `CUDAEnvironmentReset.reset_when_done()`. This function resets an environment to its initial state when its `done` flag becomes true on the GPU. Only the environments that are done get reset. \n",
    "\n",
    "To make it work properly, you need to specify which data (usually the feature arrays and observations) should be reset. \n",
    "\n",
    "This is where the flag **save_copy_and_apply_at_reset** comes into play. If the data has `save_copy_and_apply_at_reset` set to True, a dedicated copy will be maintained on the device for resetting. \n",
    "\n",
    "On the other hand, **log_data_across_episode** will create a buffer on the GPU for logs. This lets you record a complete episode. \n",
    "\n",
    "These two flags can be used independently!"
   ]
  },
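  {
   "cell_type": "markdown",
   "id": "f09d3b6a",
   "metadata": {},
   "source": [
    "The selective reset described above can be pictured with a small NumPy sketch. It is a CPU illustration under stated assumptions, not WarpDrive's implementation: `reset_when_done_reference` and the array names are hypothetical, and the real reset happens in-place on the GPU.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def reset_when_done_reference(data, saved_copy, done):\n",
    "    # Hypothetical helper: restore only the environments whose done flag is set,\n",
    "    # then clear those flags. Axis 0 of every array indexes the environment.\n",
    "    done_mask = done.astype(bool)\n",
    "    data[done_mask] = saved_copy[done_mask]\n",
    "    done[done_mask] = 0\n",
    "    return data, done\n",
    "\n",
    "\n",
    "# The saved copy plays the role of the data kept for resetting.\n",
    "saved_x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])\n",
    "x = saved_x / 4.0  # state after some steps\n",
    "done = np.array([1, 0], dtype=np.int32)  # only environment 0 has finished\n",
    "\n",
    "x, done = reset_when_done_reference(x, saved_x, done)\n",
    "```\n",
    "\n",
    "After the call, environment 0 is back at its saved values, environment 1 keeps its current state, and all flags are cleared."
   ]
  },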
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f9902ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = DataFeed()\n",
    "data.add_data(name=\"X\", data=[[0.1, 0.2, 0.3, 0.4, 0.5],\n",
    "                              [0.6, 0.7, 0.8, 0.9, 1.0]],\n",
    "              save_copy_and_apply_at_reset=True,\n",
    "              log_data_across_episode=True)\n",
    "\n",
    "data.add_data(name=\"Y\", data=np.array([[6, 7, 8, 9, 10],\n",
    "                                       [1, 2, 3, 4, 5]]),\n",
    "              save_copy_and_apply_at_reset=True,\n",
    "              log_data_across_episode=True)\n",
    "data.add_data(name=\"multiplier\", data=2.0)\n",
    "\n",
    "tensor = DataFeed()\n",
    "tensor.add_data(name=f\"{_ACTIONS}\", data=[[[0, 0, 0],[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],\n",
    "                                      [[0, 0, 0],[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]])\n",
    "\n",
    "cuda_data_manager.push_data_to_device(data)\n",
    "cuda_data_manager.push_data_to_device(tensor, torch_accessible=True)\n",
    "\n",
    "assert cuda_data_manager.is_data_on_device(\"X\")\n",
    "assert cuda_data_manager.is_data_on_device(\"Y\")\n",
    "assert cuda_data_manager.is_data_on_device_via_torch(f\"{_ACTIONS}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6563f5b1",
   "metadata": {},
   "source": [
    "Now, we run a complete set of parallel episodes and inspect the log for the first environment."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5501aba",
   "metadata": {},
   "source": [
    "## Test Run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6fd89e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# t = 0 is reserved for the initial state.\n",
    "cuda_env_logger.reset_log(data_manager=cuda_data_manager, env_id=0)\n",
    "\n",
    "for t in range(1, cuda_data_manager.meta_info(\"episode_length\") + 1):\n",
    "    cuda_dummy_step(function_manager=cuda_function_manager,\n",
    "                    data_manager=cuda_data_manager,\n",
    "                    env_resetter=cuda_env_resetter,\n",
    "                    target=100,\n",
    "                    step=t)\n",
    "    cuda_env_logger.update_log(data_manager=cuda_data_manager, step=t)\n",
    "\n",
    "dense_log = cuda_env_logger.fetch_log(data_manager=cuda_data_manager, names=[\"X\", \"Y\"])\n",
    "\n",
    "# After two steps, check that the log buffers for X and Y have been updated.\n",
    "X_update = dense_log[\"X_for_log\"]\n",
    "Y_update = dense_log[\"Y_for_log\"]\n",
    "\n",
    "assert(abs(X_update[1].mean()-0.15) < 1e-5)\n",
    "assert(abs(X_update[2].mean()-0.075) < 1e-5)\n",
    "assert(Y_update[1].mean()==16)\n",
    "assert(Y_update[2].mean()==32)\n",
    "\n",
    "# The reset function has not been triggered yet,\n",
    "# so all the done flags should be True now.\n",
    "\n",
    "done = cuda_data_manager.pull_data_from_device(\"_done_\")\n",
    "print(f\"The done array = {done}\")"
   ]
  },
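  {
   "cell_type": "markdown",
   "id": "c85e2d14",
   "metadata": {},
   "source": [
    "To make the indexing in the assertions above concrete, here is a NumPy sketch of an episode log for a single environment. The layout (one row per timestep, with row 0 holding the initial state) is an assumption inferred from the `t = 0` comment and the indices used above, and `x_log` is a hypothetical stand-in for the GPU buffer.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "episode_length = 2\n",
    "multiplier = 2.0\n",
    "x_env0 = np.array([0.1, 0.2, 0.3, 0.4, 0.5])\n",
    "\n",
    "# Assumed log layout: one row per timestep, row 0 reserved for the initial state.\n",
    "x_log = np.zeros((episode_length + 1, x_env0.size))\n",
    "x_log[0] = x_env0\n",
    "\n",
    "for t in range(1, episode_length + 1):\n",
    "    x_env0 = x_env0 / multiplier  # same update as the dummy step applies to x\n",
    "    x_log[t] = x_env0  # record the state reached after step t\n",
    "\n",
    "print(x_log.mean(axis=1))\n",
    "```\n",
    "\n",
    "Under this assumed layout, rows 1 and 2 hold the states after the first and second step, which is what the assertions on `X_update[1]` and `X_update[2]` check."
   ]
  },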
  {
   "cell_type": "markdown",
   "id": "a2343191",
   "metadata": {},
   "source": [
    "For this demo, we explicitly reset the environments to see how it works. `cuda_dummy_step()` would also do this by itself at the start of the next step. After resetting, all the done flags go back to False and the `X` and `Y` arrays return to their initial values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96940e56",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_env_resetter.reset_when_done(data_manager=cuda_data_manager)\n",
    "\n",
    "done = cuda_data_manager.pull_data_from_device(\"_done_\")\n",
    "assert(done[0]==0)\n",
    "assert(done[1]==0)\n",
    "\n",
    "X_after_reset = cuda_data_manager.pull_data_from_device(\"X\")\n",
    "Y_after_reset = cuda_data_manager.pull_data_from_device(\"Y\")\n",
    "# the 0th dim is env\n",
    "assert(abs(X_after_reset[0].mean()-0.3) < 1e-5)\n",
    "assert(abs(X_after_reset[1].mean()-0.8) < 1e-5)\n",
    "assert(Y_after_reset[0].mean()==8)\n",
    "assert(Y_after_reset[1].mean()==3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7be3887",
   "metadata": {},
   "source": [
    "# Learn More and Explore our Tutorials!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3763b06",
   "metadata": {},
   "source": [
    "Now that you have familiarized yourself with WarpDrive, we suggest you take a look at our tutorials on [creating custom environments](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4-create_custom_environments.ipynb) and on how to use WarpDrive to perform end-to-end multi-agent reinforcement learning [training](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)!\n",
    "\n",
    "For your reference, all our tutorials are here:\n",
    "- [A simple end-to-end RL training example](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/simple-end-to-end-example.ipynb)\n",
    "- [WarpDrive basics](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb)\n",
    "- [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)\n",
    "- [WarpDrive reset and log](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)\n",
    "- [Creating custom environments](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4-create_custom_environments.ipynb)\n",
    "- [Training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
